Lars Kristiansen, Eirik Pettersen and Marius Jørgensen
17/12/2020
Run project.py to generate the figures:
$ python3 project.py
The figures are saved in the /figures folder.
With Covid-19 still spreading worldwide, we wanted to examine traffic data to see whether the pandemic has had an impact on the number of traffic accidents occurring each day. Since people are advised to stay at home and travel less, we expect fewer vehicles on the road, which would reduce the chance of accidents. To narrow the scope of the project, we focused on the United States, as it is the country with the most confirmed Covid-19 cases. The aim of the project is to find out whether there is a correlation between Covid-19 and traffic accidents in the US.
No particular work inspired us to choose this topic; we were simply curious whether there was a correlation. It seemed likely that the pandemic would affect the number of vehicles on the road, which in turn would affect the probability of traffic accidents. Although traffic accidents are random events, more people driving means more opportunities for such events to occur.
The initial question we wanted to answer was whether there is a correlation between the spread of Covid-19 in the US and the number of traffic accidents. As the project progressed, we decided to look at positive Covid-19 cases and traffic accidents in each US state, which gave us additional data for comparing states. We wanted to see whether individual states had different outcomes, and how each state compared to the country as a whole. We also looked at when each state issued a stay-at-home order, to examine whether the order had an effect on traffic accidents.
Both the traffic accident dataset and the Covid-19 dataset were found on Kaggle. The traffic dataset contains data from 2016 until June 2020, with various information about each accident, such as date, time, severity, source, longitude/latitude and state. Each row in the dataset represents one accident. From the traffic dataset we only used the state in which the accident occurred and the date of the accident; additional information such as accident severity was not used. States were represented by their two-letter abbreviations, which we converted into full state names. To get the number of accidents in a state on a given day, we count the rows that match that state and date.
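A minimal sketch of this counting step in plain Python, assuming each accident row has already been reduced to a (state code, date) pair. The lookup table `STATE_NAMES` and the function `count_accidents` are hypothetical illustrations, not taken from project.py; with pandas, the same result would typically come from a groupby over the state and date columns.

```python
from collections import Counter

# Hypothetical, abbreviated lookup table; a full version would list all states plus DC.
STATE_NAMES = {
    "CA": "California",
    "FL": "Florida",
    "NY": "New York",
}


def count_accidents(rows):
    """Count accidents per (state name, date).

    `rows` is assumed to be an iterable of (state_code, date_string) pairs,
    one pair per accident, i.e. one pair per row of the traffic dataset.
    """
    counts = Counter()
    for state_code, date in rows:
        # Fall back to the raw code if the abbreviation is not in the table.
        state = STATE_NAMES.get(state_code, state_code)
        counts[(state, date)] += 1
    return counts


accidents = [
    ("CA", "2020-03-20"),
    ("CA", "2020-03-20"),
    ("NY", "2020-03-20"),
    ("CA", "2020-03-21"),
]
counts = count_accidents(accidents)
print(counts[("California", "2020-03-20")])  # 2
```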
The Covid-19 dataset contains data from the start of the pandemic in the US until the beginning of December 2020. As with the traffic dataset, it contains information that is not relevant to our analysis. The dataset reports positive test results as a cumulative count, which we converted into daily new cases. Apart from the case counts, the fields most relevant to us are the state and the date.
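Converting a cumulative count into daily new cases amounts to taking the difference between consecutive days. The sketch below assumes the cumulative values for one state are already sorted by date; the function name is hypothetical.

```python
def cumulative_to_daily(cumulative):
    """Convert a cumulative series of positive tests into daily new cases.

    `cumulative` is assumed to be sorted by date for a single state.
    The first day's cumulative value is taken as that day's new cases.
    """
    daily = []
    previous = 0
    for total in cumulative:
        daily.append(total - previous)
        previous = total
    return daily


print(cumulative_to_daily([1, 3, 3, 10, 25]))  # [1, 2, 0, 7, 15]
```

If a cumulative count is ever revised downwards in the source data, this differencing produces a negative daily value, which may need to be clipped or smoothed before plotting.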
The traffic dataset only contained data up to June 2020, which limited the potential of our analysis: with data covering only the first months of the pandemic, it would be difficult to draw decisive conclusions about whether traffic was affected by Covid-19. We contacted the creator of the dataset to ask for updated 2020 data, and he was able to send it to us. The update consisted of two datasets from different sources (Bing and MapQuest) covering accidents from June 2020 until December 2020, so we had to combine them. According to the creator of the dataset, duplicates resulting from combining the two sets are estimated to make up less than 1% of the data. We therefore decided not to do any pre-processing to remove possible duplicates.
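Combining the two sources is a simple concatenation. As a rough sanity check on the creator's duplicate estimate, one could also measure how many records are exact repeats. This is a sketch under the assumption that each record has been reduced to a hashable tuple; the record layout and sample values are hypothetical. Exact matching only catches records that agree field for field, so it is a crude check rather than proper deduplication.

```python
from collections import Counter


def combine_sources(*sources):
    """Concatenate accident records from several sources into one list."""
    combined = []
    for records in sources:
        combined.extend(records)
    return combined


def exact_duplicate_share(records):
    """Fraction of records that are exact repeats of another record."""
    if not records:
        return 0.0
    counts = Counter(records)
    repeats = sum(n - 1 for n in counts.values())
    return repeats / len(records)


# Hypothetical records: (state, date, rounded latitude, rounded longitude).
bing = [("OH", "2020-07-01", 39.96, -83.00), ("OH", "2020-07-02", 41.50, -81.69)]
mapquest = [("OH", "2020-07-01", 39.96, -83.00), ("PA", "2020-07-01", 40.44, -79.99)]

combined = combine_sources(bing, mapquest)
print(len(combined))                    # 4
print(exact_duplicate_share(combined))  # 0.25
```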
Since lockdowns and other Covid-related restrictions could also be relevant, we manually gathered information on whether and when each state issued a stay-at-home order (only leave home if necessary), and whether and when the order ended. Two sources were used for this: https://eu.usatoday.com/storytelling/coronavirus-reopening-america-map/ and https://en.wikipedia.org/wiki/U.S._state_and_local_government_responses_to_the_COVID-19_pandemic. For all datasets, we converted dates to the same format and made state names consistent.
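A sketch of how dates in differing formats could be normalized, and how a stay-at-home window with an optional end date could be checked. The accepted formats in `DATE_FORMATS`, the helper names and the example order dates are all hypothetical assumptions for illustration; the actual formats in the three datasets may differ.

```python
from datetime import date, datetime

# Hypothetical input formats; the real datasets may use others.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")


def parse_date(text: str) -> date:
    """Parse a date string in any of DATE_FORMATS into a datetime.date."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"Unrecognised date format: {text!r}")


def under_order(day: date, start, end) -> bool:
    """Return True if `day` falls within a stay-at-home order.

    `start` is None for states that never issued an order; `end` is None
    for orders that have not (yet) been lifted.
    """
    if start is None:
        return False
    return start <= day and (end is None or day <= end)


# Hypothetical order dates, for illustration only.
start, end = parse_date("03/19/2020"), parse_date("2020-05-15")
print(under_order(parse_date("20200401"), start, end))    # True
print(under_order(parse_date("2020-06-01"), start, end))  # False
```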
We started by visualizing the traffic and Covid-19 data as a line graph with a double y-axis: one y-axis for the number of traffic accidents each day, and the second for the number of confirmed Covid-19 cases. For traffic accidents, we plotted the data for the years 2018 to 2020 to see how 2020 differed from the years before the outbreak. The traffic data was highly volatile, so a 21-day moving average was used to smooth the daily values. The volatility comes from weekdays, weekends and holidays affecting the number of vehicles on the road differently, and from traffic accidents being random events. We experimented with various window sizes, but the volatility required a large window to compensate; a 21-day moving average revealed sustained increases or decreases while keeping the lines smooth. Because the double y-axis proved confusing to read (see Figure 1), we removed it and split the traffic and Covid-19 data into separate subplots. The traffic subplot displays accidents for 2018, 2019 and 2020 as three lines, with days on the x-axis and the number of accidents each day on the y-axis. The Covid-19 subplot is a bar plot showing new positive cases each day. The two subplots share the x-axis, so the Covid-19 data can be visually compared with the traffic data. Finally, we used the stay-at-home data to mark the period from the start to the end of each state's stay-at-home order.
Figure 1: A plot of New York with the original double y-axis design.
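The 21-day moving average described above can be sketched as follows. The text does not say whether the window is trailing or centered; this sketch assumes a trailing window, which shifts features roughly 10 days later than the raw data. A centered window (for example pandas `Series.rolling(21, center=True).mean()`) avoids that lag, which matters when comparing changes against stay-at-home dates. The function name is hypothetical.

```python
def moving_average(values, window=21):
    """Trailing moving average: the mean of each value and the window - 1 before it.

    Days without a full window of history are returned as None, so the
    smoothed series lines up day for day with the raw one.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the value that has just left the window.
            running -= values[i - window]
        smoothed.append(running / window if i >= window - 1 else None)
    return smoothed


print(moving_average([1, 2, 3, 4, 5], window=3))  # [None, None, 2.0, 3.0, 4.0]
```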
Traffic accidents in 2020 start out noticeably higher than in previous years, even before Covid-19 reaches the country: previous years started at around 2200 accidents a day, while 2020 started at around 2900. The numbers stay stable until the middle of May 2020, when accidents start to rise while previous years slowly decline. The first positive Covid-19 case was registered in the middle of March, but at that time there is no noticeable change in traffic for 2020. Around the middle of July, when infections reach a new record peak in the US, traffic accidents drop from almost 3600 a day to 2100 a day. As the infection rate slows slightly in August and September, the accident rate for 2020 starts to increase steadily. However, the pattern from August to December 2020 appears to match that of previous years. Covid-19 cases reach an all-time peak in December, but the traffic accident rate still matches previous years at this time.
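The daily figures above are easier to compare in relative terms. Using the approximate values read from the plot, 2020 started roughly 32% above previous years, and the mid-July drop amounts to roughly 42%. A small helper for this arithmetic (hypothetical, not part of project.py):

```python
def percent_change(old, new):
    """Relative change from `old` to `new`, in percent."""
    if old == 0:
        raise ValueError("old must be non-zero")
    return (new - old) / old * 100


# Approximate daily figures quoted in the text above.
print(round(percent_change(2200, 2900), 1))  # 31.8 (start of 2020 vs. previous years)
print(round(percent_change(3600, 2100), 1))  # -41.7 (drop around mid-July)
```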
In January, accidents in 2020 start out almost three times higher than in previous years. California issues a stay-at-home order in the middle of March, which is immediately followed by daily traffic accidents falling from over 1000 to about 700 in early April. The Covid-19 infection rate reaches a new peak around the middle of July, when traffic accidents drop from 900 to 350 a day. From September to December, the accident rates in 2020 and 2018 appear similar, with 2019 being the outlier.
In 2020, the traffic accident rate reached its lowest level of the three years during the stay-at-home order, which lasted through April. Accidents were generally lower than in previous years until July, when the patterns for the three years start to match. Accidents in 2020 do drop below previous years in December, when Covid-19 cases are rising.
Traffic for all three years follows a similar pattern, with a slight reduction in the accident rate for 2020 during the stay-at-home order, which lasted from April to May. In June, accident numbers start to rise, as do positive Covid-19 cases. Covid-19 reaches a peak of 15000 daily cases in the middle of July, when traffic accidents are noticeably higher than in previous years. In September, accidents in 2020 start to increase while 2018 and 2019 stay stable. Overall, Florida traffic in 2020 starts out with a pattern similar to previous years, but from June onwards accidents become noticeably more frequent than in 2018 and 2019.
During the stay-at-home order, which lasted from mid-March to June, traffic accidents for 2020 in Illinois appear to increase. In the middle of June, the Covid-19 infection rate begins to slow down, but 2020 traffic accidents keep increasing until the beginning of July, reaching twice the level of 2018 and 2019. Accident numbers begin to stabilize around August, but start to increase again until they reach a new peak around November. Covid-19 also reaches a new peak in mid-November, when traffic accidents are about twice as high as in previous years.
The accident rate for 2020 starts out normal, but begins to increase around the time of the stay-at-home order, which lasted from mid-March to mid-May. The Covid-19 infection rate increases sharply around this time, but starts to slow down around May. In mid-July, traffic accidents decrease to numbers that closely resemble previous years. Covid-19 cases begin to increase again around November, but traffic accidents follow a pattern similar to previous years.
In Ohio, the number of traffic accidents in 2020 starts off in the same range as in the two previous years. When the stay-at-home order starts, there is a sharp increase in daily traffic accidents, which continues throughout the stay-at-home period. At the start of July, one month after the stay-at-home period ended, the number of daily traffic accidents drops to the same level as in the previous two years.
In Pennsylvania, the number of traffic accidents ranges from around 50 to 70 reported accidents a day throughout 2018 and 2019. In 2020, however, there is a large increase in traffic accidents compared to the two previous years once the first positive Covid-19 cases are reported. Even during the stay-at-home order, the number of daily reported traffic accidents keeps increasing. From May 2020 to the start of December 2020, the number of traffic accidents decreases gradually and is now back in the same range as in the two previous years.